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Abstract. — The problem of estimating the probability p = P(g(X) < 0) is considered when X 
represents a multivariate stocha stic input of a monoto nic function g. First, a heuristic method to 
bound p, originally proposed bv lde Rocquignvl j2009h . is formally described, involving a special- 
ized design of numerical experiments. Then a statistical estimation of p is considered based on a 
sequential stochastic exploration of the input space. A maximum likelihood estimator of p based 
on successive dependent Bernoulli data is defined and its theoretical convergence properties are 
studied. Under intuitive or mild conditions, the estimation is faster and more robust than the 
traditional Monte Carlo approach, therefore adapted to time-consuming computer codes g. The 
main result of the paper is related to the variance of the estimator. It appears as a new baseline 
measure of efficiency under monotonicity constraints, which could play a similar role to the 
usual Monte Carlo estimator variance in unconstrained frameworks. Furthermore the bias of the 
estimator is shown to be corrigible via bootstrap heuristics. The behavior of the method is illus- 
trated by numerical tests led on a class of toy examples and a more realistic hydraulic case-study. 

On considere l'estimation de la probabilite p = P(</(X) < 0) ou X est un vecteur aleatoire 
ct g une fonction monotone. Premierement, on rappelle et formalise une methode, proposee 
par de Rocquigny (2009), permettant d'encadrer p par des bornes deterministes en fonction 
d'un plan d'experience sequentiel. Le second et principal apport de l'article est la definition et 
l'etude d'un estimateur statistique de p tirant parti des bornes. Construit a partir de tirages 
uniformcs successifs, cet estimateur presente sous de faibles conditions theoriques une variance 
asymptotique plus faiblc et une mcillcure robustesse que l'estimateur classique de Monte Carlo, 
ce qui rend la methode adaptee a l'emploi de codes informatiques g lourds en temps de calcul. 
Des experimentations numeriques sont menees sur des exemples-jouets et un cas d'etude hy- 
draulique plus realiste. Une heuristique de boostrap, reposant sur un replicat de l'hypcrsurfacc 
{x, g(x) = 0} par des reseaux de neurones, est proposee et testee avec succes pour oter le biais 
non-asymptotique de l'estimateur. 
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1. Introduction 



In many technical areas, the exceedance of some unidimensional variable Z over a certain 
critical value z* may define an event of probability p which has to be carefully estimated. 
Assumed to be stricly positive, p can be defined by 



P 



P(g(X) < 0) = / l {ff(x) < 0} /(x) rfx 
Jv 



with X a random vector of uncertain input parameters with probability density function (pdf) 
/, taking its values in a d— dimensional space U, and g(X) = z* — Z a deterministic map- 
ping from U to 1R. This framework is often encountered in structural reliability studies 
( Madsen fc Ditlevsenl . [l996h . when g is a computer code reproducing a physical phenomenon. 
A Monte Carlo (MC) method is the usual way to estimate p, by p n — ^ _1 X)fe=i l{s(x fc )<o} 
with large n independently sampled from /. Avoiding regularity hypotheses on g, this unbi- 
ased estimator presents good convergence properties and an estimation error independent on d. 
Unfortunately, this strategy often appears inappropriate in practice when p reaches low values, 
since g can be time-consuming and the computational budget may b e limited: a good estimatio n 
of a probability p ~ 10~ 9 typically requires at least 10 9+2 calls to g ( Lemaire &: Pendolal 2006 ). 
Furth ermore, p n has the theoretical defect not to be robust, in the sense given bv lGlvnn et al. 
( 2009f ): its relative error, namely its coefficient of variation, does not tend to a finite limit when 
p — > + , given any finite number n of trials. 

Many non-intrusive strategies have been proposed to accelerate the MC approach. Tradi- 
tional methods from the engineer community in structural reliability (FORM/SORM) treat 
the estimation of p as an optimization problem. The computational work is usually fast but 
the estimators suffer from weakly or non-controllable errors. Statistical approaches are judged 
in terms of reduction rate with respect to the MC estimator var iance Var|p n ] = p(l — p)/n . 
Methods like quasi-MC, sequential MC or importance sampling ( Kroese fc Rubinsteinl . [2007^) 
are based on selecting a design of experiments (DOE), namely a set of points in U on which g is 
tested, such that U be explored in areas close to the limit state surface 5={xeHJ; g(x) = 0}. 
Most advanced methods often get rid of the time- consuming difficul t ies by emulating the be- 



2003) which presuppose 



havior of g, for instance using kriging techniques (jCannamela et 
smoothness conditions on g. 

Minimizing the strength of regularity hypotheses placed on g underlies the development 
of specialized acceleration methods. For instance, computer code s can suffer from edge effects 



which restrict smoothness conditions (IMunoz-Muniga et aZ.l . l2011 ). On the other hand, the real 



ity of the phenomenon can imply various form constraints on Z . Especially, the as sumption tha t 
g is monotonic with respect to X is a standard problem in regression analysis (jDuroti . 120081 ). 
In the area of numerical experiments, monotonicity properties of computer codes have been 
considered theoretically and practically, e.g . proving the MC ac celeration of Latin Hypercube 
Sampling for the estimation of expectan cies ( MacKav et al . 1979t ). carryi ng out screening meth- 
ods for sensitivity an alyses (jLinl . Il993t ). constraining response surfaces ( Kleiinen &: van Beersl 
20091 : Kleiinen . 2011 ). predicti ng the behavior of ne twork queuing systems ( Rani an et all . 2008 ). 
computing floo d probabilities ( de Rocquignv , 20091) or estimating the safety of a nuclear reactor 
pressure vessel (jMunoz-Muniga et all 2011 ). 

Specific engineering works in structural reliability have highlighted the possibility of bound- 
ing and estimating p significantly faster than using a MC approach. Under the name of 
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monotonia reliability methods (MRM), Ide Rocquignv (l2009h proposed a class of sequential 
algorithms contouring the limit state surface and enclosin g p between determ i nistic bounds 
which dynamically narrow. A similar idea was explored by iRaiabalineiad et ai I (l201lh. How- 



ever, although a parallclization of such algorithms was already implemented (jLimbourg et al 
201dh , these methods were only empirically studied and some of the proposed estimators of p 



remained crude. 



The present article therefore aims to provide a first theoretical approach of the accelerated 
MC estimation of p when g is assumed to be monotonic and possibly discontinuous, although 
some smoothness constraints are assumed on the failure surface S. More precisely, this article 
is structured as follows. 

Section 2 is dedicated to a general description and a mathematical formalization of MRM. 
The main contribution is presented in Section 3: a statistical estimator of p is proposed, based 
on uniformly sampled DOEs in nested spaces. Defined as the maximum likelihood estimator 
of dependent Bernoulli data, its asymptotic properties arc theoretically studied. The estimator 
is shown to be robust, and its variance gains a significant reduction with respect to the usual 
MC case. It may also be viewed as a baseline (or target) variance for monotonic structural 
reliability frameworks. The non-asymptotic bias of the estimator is examined in Section 4, 
through numerical experiments involving a class of toy examples. Based on a neural network 
emulation of <S, bootstrap heuristics are proposed and successfully tested to remove this bias. 
Finally, a more realistic hydraulic case-study illustrates the benefits of the complete method. 

Along the paper some connections are done with other areas of computational mathematics, 
especially about implementation issues, and a discussion section ends this article by focusing 
on the research avenues that must be explored in the area of stochastic sequential DOEs to 
improve the results presented here. 



2. Material 

2.1. Working assumptions, definitions and basic properties 

Let j : X 4 be a deterministic function defined as a real-valued scalar mapping of 

X = (Ai, . . . , Xd) on its definition domain U C M d . Deterministic means that the function 
<?(x) produces always the same output if it is given the same input x. Global monotonicity is 
defined as follows: Vi, 3s{ e { — 1, +1}, Ve > 0, Vx = (xi, . . . , Xd) € U, such that 

g (x\ , . . . , Xi — i , Xi -\- 5j£, Xi-\-\ , . . . , Xd) g {x\ , . . . , Xi — i , Xi, Xi_|_i , . . . , Xd) 

where Si represents the sign of monotonic dependence: Si = 1 (resp. Sj = —1) when g is 
decreasing (resp. increasing) along with the i— th component Xi. The following assumption is 
made without loss of generality since any decreasing i— th component can be changed from xt 
to —x^. 

Assumption 1. — The function g is globally increasing over U. 

To be general, U = [0, l] d and X is a random vector defined on the probability space 
(U, £>(U), P). Next assumption is made following this same concern of generality. 
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Assumption 2. — All inputs x\, . . . ,Xd are independently uniform on U = [0, l] d . 

In real cases, x\, . . . , xg can be den ned as transformed inputs, as usual in structural safety 
problems ( Madsen k, Ditlevsenl . Il996 ). In such cases one can write x = T(y) where y = 



(?/i , . . . , yd) is a vec tor of physical inputs and T is the multivariate distributional transform 
(jRuschendorJ . 2009). Therefore g = g a T 1 where g is a mononotic function and T has 



to preserve this monotonicity. When the yi are independent, T is reduced to the vector of 
marginal cdfs . . . , Fj) and is naturally increasing, so the assumption is not restrictive. 
Else, tech nical requirements o n T are needed, which depen d on the way this joint distribution 
is defined ( Ruschendor A 12009^ . See for instance [Chenl ( 20091) for such requirements on Gaussian 



copulas. Another general result is given in the Appendix (Supplementary Material). 

Assumption 3. — Both subspaces = {x g U, g(x) < 0} and U + = {x G U, g(x) > 0} 
are not empty (so that p exists in ]0, 

Definition 1. — A set of points of U is said to be safety- dominated (resp. failure- dominated) 
if g is guaranteed to be positive (resp. negative) in any point of this set. 

Denote by y the partial order between elements of U defined by x y y Xk > j/fc Vfc = 1, . . . , d. 
Then assume that some point value <?(x) is known, and consider the sets = {x 6 U | x y x} 
and = {x € U | x X x}. The increasing monotonicity implies that if <?(x) > (resp. 
g(Z) < 0), then is safety-dominated (resp. is failure-dominated). This proves next 
lemma. 

Lemma 1. — Both inequalities are true with probability 1: 
p < l-P(XeUt) if. 9 (x)>0, 
p > P(X e Ur) else. 

More generally, assume that n input vectors (x.j)j=i n can be sorted into safe and failure sub- 
samples following the corresponding values of {g(^-j)}j=i n - They are respectively defined 

by 

5+ = {XG {Xj)j=i,..., n I ff(Xj) > 0} 
and 

5- = {x e (Xj)j=l,...,n I ff(x 3 ) < 0} . 
Then one may define the sets 

U+ = {xeu|3x 3 -es+ xhx 3 ], 
U- = {xeD|3x i6 3- x^x,} 

(see FigureQ]for an illustration). Finally, denoting p~ = P(X e U~) and p+ = 1 — P(X G U+) 
to alleviate the notations, one has in all the sequel and for all n > 0, 

Pn < P < Pt- (!) 

Hereafter, and will be referred to as dominated subspaces, where the sign of <?(x) 
is known. Note that the complementary non-dominated subspace U„ = U/ (U^ U U~) is the 
only partition of U where further calls of g are required to improve the bounds. Finally, a 
topological assumption on S is nee ded to complete the fo rmal description of the situations 
studied by Ide Rocquignv (2009) and lLimbourg et Ml (<2010l ). 
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Assumption 4. — The limit state surface S = {x G U ; g(x) = 0} is regular enough and 
separates U in two disjoint domains U~ and TU + (simply connected). 

The second part of this assumption implies that, in terms of classification, the two classes 
of points U~ and are perfectly separable when n — > oo. This property will be used later 
in the paper to carry out bootstrap heuristics. By regular enough, S is assumed not to be the 
surface of multidimensional stairs, so that it cannot be exhaustively described by a n— DOE 
with ?i < oo. This mild assumption is ensured, for instance, if g is continuously diffcrcntiablc 
on a non-empty measurable subset of S. More formally, it is assumed that V?i < oo, 



sup / !{xr<x n } cfoc < p-p n _ x , (2) 

x„e5JB„-iniJ- 



sup / l{i_ x xi- x „} dx < p+_ x -p. (3) 

x„es Jo„-inu+ 

This will imply that p~ < p < p^ and the finiteness of the strictly positive quantity Gj n+ \{jp) = 
[(Pn — P)(p ~ Pn)}^ 1 encountered further in the paper. 

Remark 1. — In multi-objective optimization, a dominated space can be interpr eted as a 



subset of a performance space delimited by a Pareto frontier (jFigueira et all 120051 ). In this 
framework, g is thought as a monotonic rule of decision depending of d variables, for which the 
set of n best possible configurations (the frontier) is searched. 

Remark 2. — The proportions (p~ , 1 — p+) are the volumes of two unions of hyperrectangles 
sharing the same orthogonal basis. Computing such volumes is known i n computational geome- 
try a s Klee's measure problem, for which recursive sweepline algorithms ( van Leeuwen fc Woodl . 



1 1 9 8 lh can provide exact solutions. Details about their implementation are given in Appendix 
(Supplementary Material). When d exceeds 4 or 5, these exact methods appear however too 
costly, and trivial MC methods must be preferred in practice to compute these quantities. 

2.2. MRM implementation: a one-step ahead strategy 

Starting from Uq~ = Uq = {0 d } and Uo = U = [0, l] d , the iterative scheme shared by 

all MRM variants at step n > 1 is based on: 



1. selecting a DOE {x£\. . . ,xl m ' ,) } € U n _i; 

(7) 

2. computing the signatures £x = 1 r / un \\ 

3. updating the subspaces 

U.+ = U+^ujxGUlaxW, eg)=0, x^x«}, 

U n = U/(U"UU+) 

4. updating the bounds {p~,p+} = {Vol(U^), 1 - Vol(U+)}. 
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Since U~ C U~ +1 Vn > 0, then P(X € TU~) < P(X G U~ +1 ) and the sequence (p~) is 
nondecreasing. Symmetrically, the sequence (pX) is nonincreasing. Since bounded in [0,p] and 
[p, 1], both sequences are converging. 

At each step, the DOE must be chosen accounting for the increasing monotonicity of g. 
Denoting xj, 1 ' and xj, 2 ' two elements of the DOE and assuming to know £4„ , it is unnecessary 
to compute ^4„ in two cases: 

if = 1 and x^ ^ x^ => x^ € T£T W and = 1, 
if = and xW r< x^ ^> x^ 2) G U+ (1) and = 0. 

Thus the order of trials should be carefully monitored, in relation with the partial order between 
the elements of the DOE. Reducing the DOE to a single element, i.e. m n = 1 for all steps, 
minimizes the number of unnecessary trials. This one-step ahead strategy is favored in the 
present paper. 



2.3. Stochastic MRM 

Initialization. - First iterations should be monitored to reduce significantly the width of 
[Pn>PnL such that further iterations mainly focus on refinements. A deterministic strategy 
seems the most appropriate to start from [0, 1] until providing non-trivial bounds. A dichotomic 
diagonal MRM, illustrated on Figure [5] in a two-dimensional case, was used in the examples 
considered further. It explores the non-dominated space in an intuitive way and stops at step 
ko > 1 such that 

log(l/p) 

k > l + -jj 7T- 

Gtl0g2 

Consequently, an expected crude prior value of p can help to estimate the minimal number ko of 
trials. To alleviate the paper, the notation (U^~,U^~,pJ,p^~) now describes the situation after 
N — 1 introductive deterministic steps with TV > ko + 1 , such that < p^ and p$ < 1. 

Switching to stochastic DOEs. — Pursuing a deterministic strategy can be too c ostly to be effi- 



cient , the upper bound p^ offering possibly a very conservative assessment of p Ijde Rocauignv , 
l2009h . Intuitively, such a strategy should be optimized by selecting the next element of the DOE 
as the maximizer of a criterion which predicts a measure of dominated volume. Apart from 
the difficulty of predicting, choosing the criterion remains arbitrary. Switching to a stochas- 
tic strategy, which allows for a sequential statistical estimation of p in addition of providing 
bounds, seems a promising alternative approach. In this framework, 

^ fn—1 

at each step n > 1, with f n -i a pdf defined on U„_i. Then the probability space (U,B(U), P) 
becomes endowed with the filtration F = (J^n) where JF n is the a— algebra generated by a 
n— sequence. The sequences (pq,...,p~) and (j>q, ■ ■ ■ , p%) become monotonic and bounded 
stochastic processes with dependent increments. 

Uniformly sampled DOEs. — The remainder of this article is devoted to a baseline statistical 
estimation of p in a monotonic framework, in a similar spirit to the MC approach in uncon- 
strained frameworks. Therefore, in the following, the sampling is chosen uniform at each step: 
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Figure 2. Diagonal deterministic (DD-MRM) strategy, assuming a low p, stopping 
after 4 steps. 
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3. A maximum likelihood estimator of p 

Assume that Xi , . . . , x n are successively uniformly sampled in the nested non-dominated 
spaces Uo, • ■ • , U„_i. Next lemma follows. 

Lemma 2. - PniPn P- 

In corollary any estimator of p located between the bounds is strongly consistent. Especially, 
any crude average of the bounds gains a statistical validity. A more sophisticated approach can 
be carried out by noticing that, at step k, the occurence of a nonzero signature £ Xk follows a 
Bernoulli distribution B(jk) conditionally to J-k-i, with 
7fc = P(>(x)<0|xeUfc_i), 

P (g(x) < 0) - P (, 9 (x) < 0|x £ U^) P (x £ U^) 
P(xeTLVi) 

from Bayes' formula, hence 

Ik = — — ■ (4) 

Pk-i-Pk-i 

After n steps, all information about p is brought by the dependent-data likelihood L n {p) = 
£n(p|xi, • • • , x n ) defined by the product of these conditional Bernoulli pdf: 

k=l \ p k-l - Pk-l ) \pk-l - Pk-l J 

the maximum estimator (MLE) p n of which is considered in next proposition. 

Proposition 3.1. — Denote £ n (p) = log L n (p). There exists a unique and consistent solution 
p n in \Pn-nPn-x[ °f tne likelihood equation £' n (p) = J2k=i^ k (p) (P k ~ p) = ^, such that 

n 

£ 0J k (p n )Pk 

Pn = ^ , (6) 

E ^k (Pn) 
fe=l 

With U> k (p) = ((p ~ Pk-l) (Pk-l ~ P)) 1 and Pk = Pk-l + {Pk-l -Pfc-lKx k 

Assumption 4 ensures the existence of p n since, by and ([3]), p cannot be reached by 
at least one of the two bounds (Pn-nPn+i) ^ or an y nn ^ c n - Similarly, the quantities defined 
in next propositions remain finite if the limit state surface S has mild smoothness properties. 
They are related to the behavior of the inverse of the Fisher information associated to (JSJ) , 
which converges to faster than the variance of the usual MC n— estimator 

n 

Lemma 3. — Assume that S is such that @ and © hold {Assumption 4)- Then, Vn > 0, 
E[l/(p-p-f] < oo, (7) 
E[l/(p+-p) 2 ] < oo, (8) 
and consequently E[w„ + i(p)] < oo. 
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Proposition 3.2. — Denote J n (p) the Fisher information associated to ([5]). Then 



JuHp) 



5>[d> fc (p)] < v n MC ( P ) 



fc=i 



(9) 



where Co = and V k > 1, 

Pfc , !-Pfc Pfc(l-Pfc) 
P 1 -P 



E 



P(l -P) 



Proposition 3.3. — Denote 70 = [(p^ — p )/p ] 2 . Then 



(10) 



In this data-dependent context, the central limit Theorem 13.11 remains classical in the sense 
that the Cramer-Rao bound given by the inverse of the Fisher information is asymptotically 
reached by the MLE. It is technically based on the martingality of the score process n n- 
{i' n (p)}n- Therefore inequalities and (|10[) imply asymptotic variance reduction with respect 
to Monte Carlo and robustness. From (JTDJ), the asymptotic coefficient of variation (CV) of the 
MLE is such that 

P /To 00 Ffo 
n 



CY[p r , 



< 



E 



Pn-l 




Theorem 3.1. — Let (A„) be any deterministic sequence in ]0, 1[ such that A„ 
the supplementary assumptions: 



1. Under 



1 n 

(i) : — ^ (Ofc(p) - E [« fc (p)]) A for any 5 > 1. 



(ii) 



fe=i 

Pn -P 
P-Pn 



Pn~P JP V n , Pn-P JP V n .,, _ f , . s„ . . 

(111) : — r and — — > with p„ = (1 — \ n )p n + X n p 



Pn -p 



P-Pn 



then 



Jn /2 (P) (Pn-P) 4^(0,1). 



(11) 



The law of large numbers (i) reflects the requirement that the sum of weights Cjkijp) cannot 
diverge faster than 0(J n (p)) from its mean behavior when n — > 00. Although difficult to check 
in practice, this behavior seems rather intuitive because the sampling trajectories mainly vary 
at the first steps of the algorithm, when the non-dominated space is still large. Therefore (i) can 
be perceived as an indirect requirement on the surface S. Assumption (ii) appears somewhat 
natural, saying that the bounds converge to p symmetrically. Assumption (iii) expresses the 
idea that any estimator located between p n and p converges to p faster than the bounds. Again, 
it seems intuitive since p n is defined as an incremental average (cf. (O), and therefore adopts 
a smoother behavior than the bounds, as a function of n. 
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Next proposition allows for an empirical estimation of the asymptotic variance and confi- 
dence intervals. The additional requirement (v) appears mild and in the same spirit than the 
smoothness assumptions on S, saying that p cannot be exactly reached by an average of the 
bounds for any finite number n of trials. 

Proposition 3-4- — Denote J n (p) — ^fcti 3 )- Under the assumptions of Theorem 13.11 

and assuming in addition: 

(iv) : Assumption (i) remains true V<5 > 1/2, 

(v) : $ n< oo such that p = (2n) _1 Y2=\ ^(p)(Ph + Pk-\) / ELi &k(p), 
then 

Jn /2 ( P ) 



\JM\ 



J^iM-J-^p)) -=» AA(0,1). (12) 



The reality of the theoretical descriptions hereinbefore is examined in the two next sections, 
through numerical experiments conducted on toy examples and a more realistic hydraulic model. 



4. Numerical experiments I: toy examples 

The statistical behavior of the MLE is illustrated here using the following generic toy exam- 
ple. For a given dimension d, denote 

d 

Z d = MY) = Y 1 /{Y 1 +Y^y i ) 

i=2 

where the physical input Yi follows the gamma distribution Q(i + 1,1) with cdf Fy il inde- 
pendently of other inputs. Obviously, V d > 2, hd is increasing in (— X\, X2 ■ ■ ■ , Xd) where 
Xi = F Y% (Yi) ~ U[ 0<1 ] , and Z d follows the beta distribution B e {2, 2~ 1 (d + l)(d + 2) - 3). There- 
fore, denoting qd tP the p— order quantile of Yd, the deterministic function defined by 

5d(X) ^ h d oT" 1 (X) - q d ^ p , 

with T _1 (x) = (Fy^xi), . . . ,FyJ-(xd)), is related to the known exceedance probability p. 
4.1. First results 

In dimension 2, using p — 5%, the behavior of MRM bounds can be easily compared to the 
MC 95%-confidcncc area (Figure [3]). This small dimension induces a significant improvement 
in precision with respect to Monte Carlo, which however disappears in higher dimensions and 
highlights the need for real statistical estimators. Studies of the root mean square error (RMSE) 
and the standard deviation of the MLE, which are plotted in Figure [5] as functions of the 
increasing number of calls to gd for dimensions 3 and 4, reflected the high variance reduction of 
the iterative estimator p n with respect to Monte Carlo but highlighted a positive bias (Figure 
H]). Indeed the highest weights favor local estimators p/. = pj, when approaching S (ie., when 
£ Xk = 1 in ([7])). On the examples considered in this last figure (as well as in other experiments 
not shown here), a marked gap in relative bias was noticed between dimensions 3 and 4. Under 
dimension 4, the bias remains reasonable from a moderate number of iterations (typically 400). 
Else it dramatically stays at high values. Other experiments have shown on this example the 



ACCELERATED MONTE CARLO UNDER MONOTONICITY 



11 



effective convergence of the empirical variance of the MLE towards the Cramer-Rao bound as 
well as the good behavior of its empirical estimate (Figure [5]) • 

4.2. Bias correction via bootstrap heuristics 

Bias removal appears as a practical requirement, automatizing the estimation of p. Indeed, 
given a finite value of n, estimating p requires to decide from which iteration k n > 1 the 
computation of the MLE p n can be worth it, redefining p n = ^i,nifin)Pi with ipi,n{p) = 

&i (p) I jLfc &3 (p) • An intuitive rule is to select 

k* = argminRMSE(p„) . (13) 

If the MLE were debiased, RMSE (p n ) ~ J~ x (p) which is minimized by fc* = 1. 

Given a fixed number n of trials, two general approaches may be used for controlling and cor- 
recting the bias B n = E[p„] — p affecting p n . A correc tive approach consists of obtaining a closed- 
form expression for the bias from Taylor expansions ( Cox fc Snell L 19681: Ferrari fc Cribari-Neto , 



1998:) or penalizing the score or the likelihood functions ( Bester fc HansenT 20051) . This ap 



proach is not carried out here since the data-dependent context would require a specific alge- 
braic work and technical developments about the empirical estima tion of the main quantities 



involved. The alternative use of bootstrap resampling techniques (lEfron fc Tibshiranil . 119931) 
can assess the bias empirically. For the simplicity of their principle, these heuristics are preferred 
here. 

In the present context, bootstrap experiments must be based on a replication S n of the limit 
state surface S (see Figure [7] for an illustration). Under Assumption 4, S can be interpreted as 
the decision frontier of a supervised classification binary problem, without horseriding of classes 
(ie., perfectly separable). Therefore S n depends on the choice of a classifier C n> M calibrated 
from an arbitrary number M of points sampled in dominated subspaces. The rationale of the 
bootstrap heuristics is as follows. Given C Hj m, the signature £ x of any x € U n can be predicted 
by the occurence of P(g(x < 0)\C ni M) > 1/2- Then denote p n< u the volume under S n . It can 
easily be estimated by p n ,M,Q at an arbitrary precision by Monte Carlo sampling (depending 
on Q). Moreover a large number S of MLE estimators of p n ,M can be fastly computed using 
the predicted signatures. 

The bootstrap heuristics make sense if the classifier is chosen such that p n ,M ~ * V when 
(n,M) — > oo, so that the features of the experiment arc asymptotically reproduced. More- 
over, C n> M must produce a monotonic surface S n . For these reasons, the fou r-layer monotonic 
Multi-Layer neural networks (MLNN) proposed bv lDaniels fc Velikova ( 2010h have been chosen 



for the experiments. Based on a combination of minimum and maximum functions (so-called 
MIN-MAX networks) over the two hidden layers, these networks have universal approximation 
capabilit i es of monotonic continuous functions. Besides, this choice matches the advices by 
Hurt add ( 20041 ) who strongly recommended the MLNN and Support Vector Machines (SVM) 



to estimate S in a structural reliability context. Both tools are flexible, can estimate a frontier 
on the basis of a few samples and overcome the curse of dimensionality. 
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Figure 3. MRM deterministic 
bounds and MLE, with Monte Carlo 
and MLE 95%-confidence areas, 
in dimension d = 2, for p — 5%. 
Empirical estimations are made over 
300 parallel MRM trajectories. 



Figure 4. Relative bias of the MLE 
p n for the dimensions d £ {2, 3,4}. 




Figure 5. Root mean square error (RMSE; left) and standard deviation (right) of 
the standard Monte Carlo estimator and p n for d = 3 and d = 4, for p = 0.05. 
Empirical estimations are made over 300 parallel MRM trajectories. 
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Figure 6. Ratios of MLE standard deviations over Monte Carlo standard deviations, 
computed over 100 MRM replications, in dimension d = 2. 




Figure 7. Two-dimensional situation after n — 14 iterations. A replication S n 
(dashed curve) of S can be produced based on a monotonic neural network prediction 
of £ x in U n . The volume under S n corresponds to a new probability p„ in the 
magnitude of p, which can be estimated by Monte Carlo at an arbitrary precision. 
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Classification-based bootstrap algorithm 

1. Sample x+ = (xj, . . . ,xj^) ~ U v + and x+ = (xj" , . . . , x^) ~ U v - . 

2. From (x + ,x _ ), build a monotonic classifier C„.m of (TU _ ,U + ). 

3. Replace g by the uncostly monotonic (increasing) function 



ff(x) 



-1 if P{g(x < 0)\C n>M ) > 1/2, 
+1 else. 



4. Sample Xi, . . . , Xq ~ Uv and compute p n ,M,Q = Q 1 J2k=i l{g(x k )<o} • 

5. For i=l,...,S, get a MLE estimator p„ M q then estimate B n by 

s 

B n ,M,Q,S = S ~ l ^2,Pn,M,Q ~Pn,M,Q- 
i=l 

Numerical tests in function of n and d were conducted, and the results are presented in Table 
[TJ The bias correction is found to be effective even from a moderate number of iterations (some 
hundreds) until dimension 5, and a budget of at least n = 1,000 is enough to correct a bias in 
dimension 8. With less than 10% of overestimation on average on this example, these bootstrap 
heuristics also appear relevant when the exact value of p is less interesting than its magnitude, 
which is often the case in d esign optimization where i t is aimed to diminish p of a given factor 



by constraining the inputs ([Tsompanakis et al\ . I2007D 



Dimension d 







p = 0.05 






P 


= 0.005 




n 


2 


3 


4 


5 


8 


2 


3 


4 


5 


50 


1.19 


4.17 


6.93 


16.87 


27.85 


8.21 


13.80 


18.22 


39.57 


100 


0.28 


2.31 


4.79 


12.94 


22.98 


6.15 


11.55 


15.67 


31.40 


250 


0.21 


1.87 


3.34 


8.74 


19.12 


3.28 


8.72 


11.01 


24.68 


500 


0.12 


1.25 


2.87 


6.20 


16.76 


1.14 


5.84 


8.12 


16.06 


1000 


-0.02 


0.47 


2.14 


2.97 


12.85 


0.12 


2.72 


5.28 


9.23 


2000 


-0.008 


-0.28 


1.61 


2.08 


7.66 


-0.34 


1.55 


3.09 


6.51 



Table 1. Relative error in % between estimated bias and real bias, for two true 
probabilities p = 5% and p = 0.5%. Results are averaged on 100 experiments, each 
boostrap estimation being based on S — 1,000 MLE replicates. For each n, the 
neural network is build from M = 10 6 sampled vectors, with a classification error 
rate less than 0.25% on these training data. 
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5. Numerical experiments II: a simplified hydraulic case-study 

De Rocquigny (2009). ILimbourg et al. (2010) then iMunoz-Muniga et all (|201ll ) considered 
a simplified but realistic hydraulic model linking the downstream water level H (m) of a river 
section, of width b = 300 (to) and length I = 5000 (to), with the upstream discharge Q (m 3 /s) 
and the friction coefficient K s (to 1 / 3 / ' s) of the river bed. Denoting Z m and Z v the upstream 
and downstream altitude of the river bed above seal level, 

3/5 

Q 



H 



bK f 



-Zv 



Assuming a dike level ha = 55.5 (m), the flood probability is p = P(g' o T _1 (X) < 0) 
where Y = {Q,K S } and T = (F Q ,F Ks ) (2-dim. version) or Y = {Q, K s , Z m , Z v } and 
T = (FQ,F Ks ,F Zm ,F Zv ) (4-dim. version), and 
g'(Y) = h -Z v -H(Y), 

whic h is increasing in {—Q, K X , Z m , —Z„). Input distributions or punctual values are chosen 
as in Limbourg et all ( 2010l) . Q follows a Gumbel distribution with location 1013 and scale 



558, truncated in [10, 10 4 ]. K s is normal A/"(27.8, 3 2 ) truncated in 0. In the 4-dim. version, 
Z m and Z v are triangular on [53.5, 56.5] and [48.5, 51.5] with respective modes 55 and 50 (their 
respective values in the 2-dim. version). 



For several computational budgets and averaged over 100 repeated experiments, two al- 
ternative methods are compared to the MRM bias-corrected MLE: the MC method and an 
engineering FORM-IS method build on two steps: (a) with a limited number of trials (no more 
than 40), the First-Order Reliability Method (FORM) is run to provide an estimate of the 
conception point (3 = argmin ||u|| on {g' o F^ 1 o $(u) < 0}, with $ the standard normal pdf 
and u a random variable evolving in the d— dimensional standard Gaussian space U ; (b) an 
Importance Sampling (IS) method th at uses the budget left to sample in U using a standard 
normal distribution centered on (3. See Anonymous! ( 2011 ) for details about the implementation 
of the method. 

For a given n, the three methods are compared through the following indicators: E[p„], 
CV[p„] and the relative average precision 7„ = E[(p+ — p~)]/p. Using the DOEs produced 
by the MC and the FORM-IS methods, these bounds can obviously be computed accounting 
for the monotonicity of g. For each version a MC computation involving 40,000 particules 
provides a precise estimate of p, which is used for estimating p in j n . Finally, S = 1,000 
bootstrap replicates were used for the correction of each MRM-MLE estimate. The results are 
summarized on Table [2] 



In terms of magnitude, the three methods perform similarly. The benefit of using MRM 
instead of MC or FORM-IS in these low dimensions clearly appears in most cases, and more 
obviously in dimension 2: MC needs at least 200 times more iterations than MRM to reach 
a similar precision CV[p n ], and if FORM-IS is significantly better than MC, the precision of 
its estimates remains far beyond of those produced by MRM. In dimension 4, the difference 
between these two methods somewhat vanishes and they lead to close performance when the 
number of iterations remains low. For both dimensional cases, it was noticed that a single 
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FORM run can provide a crude estimate of p with good magnitude after 10 iterations only. 
But the dimensional increasing allows the part of the importance sampling falling into the 
non-dominated area to be greater than in a two-dimensional setting. 



n 


method 


dimension = 


2 


dimension = 


4 






E[pn] 


CV[p„] 




E[Pn] 


CV[j3„] 




100 


MC 


0.002775 


190% 


2,900% 


0.010075 


99% 






FORM-IS 


0.002241 


68% 


478% 


0.018147 


74% 






MRM 


0.002781 


14% 


48% 


0.015498 


82% 


1,400% 


200 


MC 


0.002775 


134% 


630% 


0.010075 


70% 


2,300% 




FORM-IS 


0.002667 


44% 


244% 


0.010242 


42% 


2,230% 




MRM 


0.002776 


6% 


24% 


0.012451 


35% 


800% 


1,000 


MC 


0.002775 


60% 


515% 


0.010075 


31% 


1,200% 




FORM-IS 


0.002736 


27% 


168% 


0.009959 


27% 


1,000% 




MRM 


0.002775 


0.12% 


5.6% 


0.010911 


20% 


300% 


40,000 


MC 


0.002775 


9.5% 


475% 


0.010075 


5% 


247% 



Table 2. Estimation results for the two-dimensional and four-dimensional versions 
of the problem. 



6. Discussion 

Many structural reliability problems deal with the fast estimation of a probability p of an 
undesirable event. This event can often be defined by the occurence of an exceedance in output 
of some time-consuming function g with stochastic multidimensional inputs. In the present 
article, g is assumed to be mon otonic and possibly n on-c ontinuous. 

Pursuing pioneering works bv lde Rocauignvl (|2009h and lLimbourg et al. I (<2010h who explored 



hcuristically the benefits of this framework, this article first offers a formal description of the 
latter that focuses on the existence of deterministic bounds around p. A sequential strategy of 
numerical experiments in the input space allows for a progressive narrowing of this interval. The 
second and main aspect of the paper is the definition and the study of a statistical estimator 
of p when the strategy becomes stochastic and leans on uniform nested sampling. Easy to 
compute, it is defined as the maximizer of a likelihood (MLE) of dependent data sampled from 
Bernoulli distributions, whose parameters are explicit functions of the dynamic bounds. 

A keypoint of the paper is the theoretical description of its asymptotic properties, which 
are found similar to those arising from the classical estimation theory, provided some intuitive 
assumptions are respected. They are found mild in practice on some examples. Both theoretical 
and applied results show a significant improvement of the fastness and the robustness of this 
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estimator with respect to the usual Monte Carlo estimator. In the third part of the paper, 
boostrap heuristics are proposed and carried out successfully to remove the non-asymptotic 
bias affecting the MLE, via constrained neural networks. Only a basic continuity assumption 
on the limit state (or failure) surface is needed to benefit from their universal approximation 
capabilities. 

Thus, the tools proposed in this article and its supplementary material in Appendix can be 
directly used in structural reliability applications, without preliminary learning step (as usual, 
for instance, in stratified methods). However, the generality of the frame allows for a wider 
range of theoretical and applied studies. These research avenues arc briefly discussed in the 
following items. 



Bias correction. — Following Hurt ado] ( 2004 ). support vector machines (SVM) should prob- 



ably be considered instead of neural networks, since their geometric interpretation of margin 
maximizers appears more suitable. In addition to the monotonicity constraint, they should 
be build at step n under the linear constraint that the volume under the predicted surface be 
equal to the current (biased) estimator p n . This would certainly improve the properties of the 
bootstrap heuristics. More importantly, this method should be now tested on a large variety of 
examples, and the intuitive feeling of its ability to correct the bias must be confirmed by more 
applied and theoretical studies. 

In parallel, future studies should focus on adopting a corrective approach to the bias affecting 
the MLE, then on selecting a slippery window of indexes, according to (|13|) or a similar rule, 
such that the MLE converges faster to p. The comparison of the experimental benefits of both 
approaches would help the method to become more ready-to-use. 

Simplifying the assumptions. — Most of the technical assumptions that are needed to get the 
theoretical results present some intuitive features, and are underlyingly linked to the nature of 
the limit state surface. However, they remain difficult to check in practice, although asymptotic 
normality was always noticed in numerical experiments. Therefore, future work should be 
dedicated to simplifying those assumptions and classifying the limit state surfaces in function 
of their ability to allow a fast and robust estimation of p. 

Sensitivity studies. — Crucial tasks in structural reliabil i ty are sensitivity studies of proba- 
bilistic indicators to the uncertainty input model ( Moriol 2011 ). Therefore, assuming X = 



T(Y) where the Y represent physical inputs with multivariate distributional transform T(y = 
(Fi(yi), . . . , Fd{yd)) (each Fi being the marginal cdf of Yi), given a budget n, the variations of 
(Pn-iPn-iPn) due to modifying T in T e should be the subject of future works. As a supplementary 
benefit of the method, the new values (p^ , p~^ , p n ,e) , for k G {0, . . . can be recomputed 
without any supplementary call to g, thanks to an importance sampling mechanism. Indeed, 
as the subspaces (U^" ) remain dominated whatever the choice made on input distributions 
in the physical space, then 



Pk,e = I dT e (y) and p\ f = 1—1 dT e (y), 
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which can computed by a simple Monte Carlo method. In such future studies, we suggest that 
the progressive bounds could be defined as robust if they remain true whatever the fluctuations 
of F e in a well-funded variational class around F. 

Exploring other forms of stochastic DOEs. — A kcypoint of future works will be to elaborate 
unbiased estimators from sequential stochastic designs of experiments with non-asymptotic 
properties. Indeed, the asymptotic variance of the MLE reaches the Cramer- Rao bound J~ 1 (p). 
Therefore any unbiased estimator based on sequential uniform sampling, especially those defined 
by Pn = Sfc=i ^kPk where the u>k are now deterministic weights, independent on p and summing 
to 1, will never reach a lower variance than J^ 1 (p), even though the uik are optimized. Improving 
the Monte Carlo acceleration nJ~ 1 (p)/(p(l— p)) will only be possible using less naive strategies 
than uniform samplings. The problem of defining such samplings so that an unbiased estimator 
of p has better statistical properties will be the subject of a future paper. 

Towards partial monotonicity. — Finally, the practical limits of monotonicity assumptions 
should be refined. Intuitively, monotonicity as a building hypothesis seems antagonist to high- 
dimensional structural safety problems, and could mainly characterizes the behavior of g as 
a function of its most influential input var iables (as dete r mined by global sensit i vity a naly- 
ses). Indeed, the real exa mples treated bv Ide Rocauignvl ( 20091) : Limbourg et al. ( 2010h and 
Raiabalineiad et al. (12011 ) do not go beyond dimension 4. Partial monotonicity, as defined by 
Daniels fe Velikoval (|2010l) . is a more appealing and realistic property, for which the methods 
developed in a pure monotonicity context should be adapted in the future. 
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Appendix A 
Proofs 

Proof of Lemma [2l — An infinite uniform sampling on U provides on the open sets 

(U~,U+) two topologies constituted by the collections of open subsets TUq , . . . ,Un, ■ • ., and 

Uo , . . . ,Un, Hence the sequence (U~,U+) define two covers (exhaustions) of (U~,U + ). 

Then 

CO 

U x = (J U k 



IT 



U+ = \J Ui = U+ 



k=0 



k=0 



and lim„_ ) . 00 ;j~ = lim„_ > . o0 P(X G U~) = P(X £ U^) = p by inclusion. Similarly, limp+ = p. 
Furthermore, given p^ and Pq, p~ and 1 — pt are T n \— adapted submartingales bounded in 
Vp > 1. Then, from Doob's theorem (j Meyer , Il972h . the bounds converge almost surely to 
p. □ 

Proof of Proposition I3TT1 — One may write I'lXv) — Sfe=i &k (p) &k (p) with 

Sk(p) = -l + (Pk-p)u k {p)(2p-p- k _ l -p+_ l ) 1 (14) 
= -u) k (p)(p~p k ) 2 - 

Hence ££(p) < in (p~_ lf p+_i). Besides, lim^-^ ^(p) = oo and lim p ^ p +_ i ^(p) = -co. 
Hence, by twice continuity and differentiability of £ n (p), the mean value theorem implies the 
existence and unicity of a MLE p n in ]p^_i)P„_i[- ^ 

Proof of Lemma [3] — We shall proceed by induction. Since p^ < p < p^ , J5]) and ([7]) hold 
for n = 0. Denote rj n = l/(p — p~) 2 . For n > 1, it is assumed that E[ij n ] < oo. Then 



E [ ?7 „ +1 ] = E [r, n E [1 - £x„ +1 1 J- n ] ] + E 



E 



fx»+l/ I V - P n - Vol x„+l ) I- 7 """ 



with Vol^ n+i = J v nU _ l{ x -<x [1+ i} <^ x the additive volume of formerly non-dominated failure 
points in U n that are now dominated by the failure point x n+ i. By hypothesis, the first term 
is always finite. Furthermore, with 



Vol Xn+1 < sup 

x„GU„nu~ Ju„nu- 



l{x^x„} dx 



sup 

x n e5^u„nu- 



l{x^x n } dx, 



d2|) implies that Vol Xn+1 < p — p n ■ Since p n = p n _ 1 +Vol Kn , etc., one has J2k=i Vol Xk < P~Pq 
Then 



E 



E 



£x n+1 /(p-p„ - Vol Xn+1 ) \F n 



= E 



n+l 



k=l 



< OO. 



The same rationale applies to l/(Pn ~ P) 2 1 by symmetry, since p^ +1 = p^ — Vol x 1 with 
Vol x„ +1 = /u„nu+ !{i-xr<i-x n+1 } dx. □ 
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Proof of Proposition mi — One has E[£' n (p)} = J2k=i E [&k(p)E ]p k - p^k-i]] = since 
U) n +i depends only on F n , hence the Fisher information J n (p) = Vax[£'^{p)] = E[^(p)] is equal 
to — E[£' r [(p)} by twice differentiability and continuity of £ n {-), similarly to a classic iid. case. 
Assumption 4 implies that Vn < oo, p^-i < P < Pn-i: * e - P cannot be reached in any finite 
number of iterations, so that these quantities are well defined. With —S n (p) = ui n (p)(p — p n ) 2 
V n > from {H}, 



fc=i 



(Pn-l -P„-l) 2E Kx„|^-l] - (P-P„_l) 2 , 



J n {p) = £E[w2(p)Var|p fc |.F fc _ 1 ]] = ^E[^(p) 
fe=i 

since 

Var [p n |J" n _i] = 

= (Pn-1 - Pn-l) (p - Pn-l) ~ (p ~ Pn-l)^ , 

= ^(p)- (15) 
Inequality @ is a simple consequence of Jensen's inequality: since E _1 [cl^ 1 ^)] < E[wfc(p)], 

P(l -p) 



then J" 1 (p) < ^E- 1 ^- 1 ^)] 



□ 



E(i-^-i)- 1 



fc=i 



Proof of Proposition [3731 — Using the notation Skip) defined in ([14 



UP) = - 

with 

Jn{p) = E 



E 



p(l -p) 



^ Wfc (p) Sfc (p) 



nV^{p) £j 

n 

5^j>(i - p) (pt-i - p) 2U ^ 2 (p - Pk- 



-1 Jnip) 

v n MC ( P ) 



-2£* 



.fe=l 



which can be rewritten as 



Jn(p) 



k=l 



6, 



P(l -P) 



E 



P(l - P) 



(P-Pfc-J / J L \(Pfc-i -J»)' 

Since p" 1 ((?>£_ l ~p) + (p~ Pfc-i)) < Pk-i, then 

pi-i-P < p{Pk-\ - 1) +Pfc-i < P(Pfe-i -l)+P = PPfe-i, 
P-Pk-i < p(Pk-i + 1) -Pfc-i ^ P(Pfe-i + !) -P = PPk-i- 
Hence 



p(l-p) > l-p ^ 



P(l -p) > 1 -P 



(P-Pfe-O ^- 
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Consequently, 

1-p 



Jn(p) > 



P 



fe=i 



i-p 

> -raE 



since (p n ) is a strictly decreasing positive process. Since {p n ,p^) are predictiblc processes, p 
and Pq are deterministic quantities, then 



E 



1 

7 



Po 



pi) -Po 



l/7o, 



and J n (p) > n (h^J which proves ([TP]) . 



□ 



Proof of Theorem 13.11 — Given the stro ng consist e ncy o f p n , it s asymptotic normality can 
be established using arguments studied by ICrowderl (|1975l [1983). Showing that £' n (p) is a 
J-'n—l— adapted martingale is a classic result: 

E K+l(p) - CWI^n] = W n+ i(p)E [p n+ i -p|J" n ] = 0. 

Furthermore J n {p) < »iE[w n (p)] < oo under Assumption 4 (cf. Lemma [3]) Hence ^(p) is square 
integrable. Denoting A n (p) = ^(p) - ^_ x (p), then A 2 (p) = o) 2 (p)(p„ -p) 2 and 



E [A* (p)!^., 



w 2 (p)Var[p„|J"„ 



w„(p). 



Then < £'(p) >„= J2k=i^k{p) denotes the increasing (or bracket) process of £' n (p)- The proof 
can be achieved in three steps. 

1. — With J n (p) = E[< £'(p) >J and lim„_ ! . 00 J n (p) = oo from (flu]) , establishing asymptotic 
normality first requires to prove the following law of large numbers (LLN) 



M n {p) = J- 1 ^) < £'(p) >„ -1 A 0. 

Denote W n {p) = ££ =1 (a> fc (p) - E[w fc (p)]) Theil > Ve > 0, 

P(|M„(p)|>e) = P(J- 1 (p)\W n (p)\>e), 



(16) 



P70 I 



W„(p)|>e) from®, 



< P 



|W n (p)|>e' with e' = e/(p 2 7o ), 



which tends to under (i) and proves (|16p . 
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2. — For all k G {l,...,n}, denote = J n 1 (p)|Afe(p)|. The second requirement of 

asymptotic normality is proving the following Lindcbcrg condition: Ve > 0, 

^EE^I^jM ^ 0. (17) 

A Lyapunov condition is often used instead of (|17p . but requires 2+5-order moment assumptions 
on Cj k (p). An alternative approach is the following. From Markov's inequality and since the 
Cjkijp) are increasing functions of k, 

P(T M >6|^_0 < -|#, < ^ {P) 



e 2 J n (p) e 2 J n (p)' 
It follows from (Tl6l) that 



W n (p) , 1 - / n. &„(p) , f Jn-l(p)\ (< Z'(P) >„_! 



Jn(p) ^ ^n(p) V ^"(P) / V ^n-l(p) 

1. 

n— >-oo 

However, by Lemma [31 E[tD n (p)] < oo which means that J n (p) ~ Jn-i(p)- Necessarily, 
ti n {p)/Jn(p) > and 

i fc ,„ = E[l { r fcn > e} |Jfc-i] ^— > 0. (18) 

Note besides that 

E [A^(p)l{ rfc n>c j|J r fc _i] < E [AUp)^] E [l { r fc ,„> £} | ^-1] + |Cov [A 2 k (p), l{r fc ,„> e }|.F*-i] | 



< u k {p)L Kn + Q 2 k {p)^fV'AT [{p k - pf\ Jfc-i] J Var [ 1 {r fe ,„>e}|^fc-i] 



from Cauchy-Schwarz inequality. Since Var[A ] < E[X] when A G {0, 1}, then y Var[l{p fc „> e } l-^fe-i] < 
y/Lk~n- Furthermore, denote 



K k , n (p) = ^fc(p)v /Var [(Pfc -P) 2 l-^fc-i]- 

From Lemma HI below and under (ii), then Kk, n (p) — > 0. Therefore, one may write 
E[Ag(p)l {r >e} |.F fc _i] < w fe (p)/3 fe ,„ 



with /3 fc ,„ = L fc ,„ + K ktn (p)^/L k ,n — ^ from (JTSJ). Then 

n 



fe=l 



ACCELERATED MONTE CARLO UNDER MONOTONICITY 



25 



and given (|16[) . Tocplit z lemma pro ves (|17p . Finally, (|16p and (|17l) prove the two martingale 
central limit theorems ( Bercul . 12008 ): 



Jn 1/2 (PK(P) 



£ 



71— >-00 



* AT(0,1), 



< £'(P) >r 



> A/"(0,1). 



(20) 
(21) 



Lemma 4- — If 3 such that < 700 < 00 and Pn E -^-> 700, then, Vfc > 1, 



Var [u> k (p)(pk -p) 2 \Fk-i] 
Proof. — One may write 



p 



loo + I/700 - 2. 



2 P ~ Pk-i Pk-i ~ P 

Uk{p){Pk-p) = (l-^x k )-C — +£x k - 



CJfe 



(P){(PiLl 



P-Pk-l 



P) ~ [P ~ Pk 



:-r) 2 } 



FFii/i £ Xk ~ B(7fc) and from Q), i/ien 

\2 |T - 1 



P-Pk-l 
Pfc-l - p 



Var [u)fc (p) (p fe - p) 2 1 J" fc - 



(Pfe-i-Pfc-i) 2 
w fe (p)(pti+Pfc-i- 2 P) 2 > 



[(Pfc-i +Pfe-i - 2 P)(Pfc-i -Pfe-i)] 2 . 



Pfc-i-P , P-Pfe- 



P-Pfc-i Pfe-i-P 



A -2 A 7oo + l/7oc-2. 



3. — A last condition is required to transfer the asymptotic normality from £' n (p) to (p„ — p). 
Since p~_ 1 < p n < p\_ 1 , for any n there always exists an open neighborhood Vp n of p containing 
p n . From twice differentiability of £ n (') and continuity of £'„(■) in Vp n , the mean value theorem 
implies there exists some intermediate point p n € Vp n between p and p n such that 

C(Pn) = = £' n (p) + (Pn-p)C(Pn) 



and moreover p n 



p. Thus, with C(p n ) ^ 0, 



(P«-p) = C(p)(-^(Pn)) 

and it is necessary to prove the LLN 

C(Pn) JP 
< £'{p) >n 



(22) 



(23) 
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to obtain the final result (Theorem 3 in Crowder ( 19831 )). combining ([^)) with (fT())) and (f^Uj) . 
Based on (iii) this last LLN is straightforward. Indeed, Vfc < n, 



Pj - Pr. 



- 1 



Pk -P 



\Pn ~ P\ < \Pn - P\ JP^ Q 
Pi - P ~ Pn ~P 



Similarly (p„ - p k )/{p - p k )-l 



7fc+i,' 



(Pn -Pfc+l) 2 £fc+l(?>n) 



fe— »-ra— ^oo 
IP 



» 0. Withp/,+1 G ,_p^} then, for fe G {0, ... , ra— 1}, 



fc^-n— )-oc 



-> 1. 



Furthermore, some calculus proves that Kfe : „ = Wfc.f i (p n )/o)fc+i (p) 

-^(p) = ELiA2(p), 



-> 1. Then, with 



< *'(p) >, 



X) &k(p)Kk,n1k,: 

k=l 

n 



-> 1 from Toeplitz lemma. 



□ 

Proof of Proposition [3751 — Using the notations of the previous proof, note that J n {p) =< 
i'ip) > n - By twice continuity and derivability of J n ~ 1 (-) in a Taylor expansion gives 

J n -\Pn) = J n - 1 (p)-S^-(p n ^p)(l+o(l)). 

(P) 



After some calculus, 



with i? n = yJJ n (p)/Jn(p) ^ 1 from (HU), £/„ = sgn{J' n (p))^JJp){p n -p)(l + o(l)) 
W(0, 1) from Theorem [O and 



Jn 3/2 ( P ) / J„(p) 

l#(p)l V J «(f) 



- 1 



Thanks to Slutsky's theorem, it is enough to show that Z n — > to prove the statement of the 
proposition. Notice that 



j» - j2^(p){^p-{pi-i+Pk-i)} 



k=l 
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which is always nonzero assuming (iv). Holder's inequality gives 



E ^fe(p) 
fc=i 



< 



E££(p){2p-(pti+fc)} 

E^-^-i+^-r)} -1 



E wfc(p) 
fc=i 



hence 
J» 3/2 (p) 

l4(p)l 



< 



' E ^0) 
fe=i 



Jr t 3/2 (P) 



< 



Another Holder's inequality gives 

\J'niP)\ 2 P-(pt-l+Pk-l) 

shows that each term of the sum is stricly smaller than 1. Then 



\Z n \ < y/ri 



J nij>) 



Jnip) 



- 1 



^ %£0 5 *(p)- e [°*(p)]) 



from (Unjl, then Z„ ^> if (i) remains true V<5 > 1/2. 



□ 
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Appendix B 

Supplementary Material 

This supplementary section first provides details about the implementation of sweepline 
algorithms to solve Klee's measure problem, which allows for an exact computation (modulo 
rounding errors) of the probability bounds (jPniPn) > a pseudo-code is given for direct use. 
Then a general result is given about the preservation of monotonicity when the uniform input 
x = (xi, . . . , Xd) results from an inverse transformation of the joint cdf. 



B.l. A sweepline algorithm to compute volumes of hypercubic unions 

Sweepline (or plane s weep) algorithms are common ly used to jointly detect and sort intersec- 
tions between segments ([van Leeuwen fc Wood , Il98lh . The d-dimensional volume is calculated 
recurs i vely by exploring all n -l-di mensiona l "slice s" of the d-th dimension. See lShamos &: Hoev 



l|l976t k IdeBerg et al 



19971) and IChlebud (|l998h for more explanations. When segments are 
parallel or perpendicular such as their intersections define a union of hypercubes s haring the 
same orthogonal b asis, the volume calculation is known as Klee's measure problem (jErickson . 



1998 ; Chan . 200&f ). A pseudo-code follows to be used for direct implementation. 

Let A n be the n x d matrix of n vertexes (xi, . . . , x n ) defining the union of hypercubes (for 
an example, see Figure [T]). In the following pseudo-code, the volume considered is V~, also 
defined by the points of A„ and the origin (0, . . . , 0) of the U— space. 



Algorithm V0L(A„,n, d). 



1. Let A' n = <J n ,d{^n) be the nxd permutation of A n arranged in the increasing 
order of the n— vector of d— dimensional components. 

2. Remove the d— dimensional components from A^ and denote Vol„ = 0. 

3 . For i g {1, . . . , n} , 

(a) Consider the slice A$ = {x£, ...,x' n & A' n } . 

(b) Denote Vol ra the d — 1— dimensional volume of A^ . 

If dimZ^ = 1, 

• A$ is a ?i — i + 1— vector and Vol^ = max{x G A^}; 

• force i to the index of this maximal component in A^ ; 

else Vol„ =WL(A$,n-i+l,d-l). 

(c) Let Aj = A' n [i, d] - A' n [i - 1, d] the size of A$ (assuming A'„[0,d]=0). 

(d) Compute Vol^ = Aj • Vol n the d— dimensional volume. 

(e) Update the total volume Vol„ = Vol n + Vol^ . 



In practice, this algorithm seems to remain little used for dimension d larger than 2 or 3. 
This is not surprising because its complexity C{n,d) (the number of runs for a d-dimensional 
hypervolume between n points) is 0(n d ). This appears when considering the first developments 
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of C(n,d): 
C(n,d) = 



n-l 



n-1 



J2c(n~ k,d- 1) = ^(fc + l)C(n- k,d-2), 

k=Q k=Q 
n-l /k-1 \ 

E \Y,P )C(n-k,d-3), 

k=0 \p=0 / 

'g ' t + 1 f + 2 » C (n- M -3), 

fc=0 



Note however than the fastest version of this algorithm, proposed by Overmars fc Yap! ( 199lh . 
runs in time 0[n d l 2 logn) for d > 3. An alternative approach was presented bv lChlebusI (1998) 
with the same asymptotic performance, although its exposition was restricted to dimensions 3 
and 4. At the present time the computational difficulties raised by diminishing the co st still 
remai n open problems, although some slight improvements have recently been found by IChan 
(2008). Some ideas of possible improvements could possibly come from a parallel with multi- 
objective optimization contexts (cf. Remark 1 in the article). Indeed, algorithms running in 
polynomial time Q(n kl d k2 ) to com pute hypervolume metrics of Pareto frontiers have already 
been proposed by Fleischer (|2003l) . 



B.2. Preservation of monotonicity through space transformation 

Consider g a monotonic function with physical input random vector y = (yi, . . . ,yd) and 
denote T their multivariate distributional transform. The methodology proposed in the article 
applies using the transformed function g = g o T" 1 , provided T _1 is a globally increasing 
function of independent uniform inputs x = (x\, . . . , x^). This is ensured when (j/Xj ■ • ■ , yd) are 
independent, since T- 1 = (Tf \ • • • , Fd 1 ) where F i is the * th 

marginal cumulative distribution 
function (cdf). In dependent cases (and possibly when the ph ysical inputs mix co ntinuous and 
discrete distributions), the generalized Rosenblatt's transform ( Riischendorl . 20091 ) may be used 
if the inputs can be stochastically conditioned, namely they can be sorted to get the explicit 
writing 

d 

T(yi,...,y d ) = F 1 (y 1 )Y[F l \ 1: ...^ 1 (y l \y 1 ,...,y t - 1 ). 

i=2 

Under this assumption, next lemma provides an intuitive sufficient condition for F^ 1 to be an 
increasing function of all Xi ~ U[0, 1]. 

Lemma 5. — Assume that for i = 2, . . . , d, there exists a mapping and a set of (possibly 
random) parameters 0i independent of Yi, . . . , Yi such that: 

(i) : Y t =MY 1 ,...,Y l _ u 6 l ), 

(ii) : fi is a globally increasing function of Yi, . . . , li-i; 
then T _1 (x) is an increasing function of x. 

Multivariate normal distributions are often selected as approximate ways to tackle the 
difficulties of assessing correlations between input physic al paramete rs, and therefore deserve 
particular interest in the field of computer experiments. If lChen (120091) obtained general results 
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about the preservation of monotonicity when these distributions are given under the form of 
Gaussian copulas, an immediate corollary of Lemma [5] is to notice that any standard binormal 
input distribution with positive correlation coefficient pt ensures that T -1 (x) is increasing. 
Indeed, Y = (Yi,Y 2 ) where Y 1 ~ W(0, 1) and Y 2 = \iY x + - n 2 9 with 6 ~ Af(0, 1). A 
similar result can be found for the class of elliptical bivariatc copulas. 

Proof of Lemmal — Assume (i). Vi G R, Vfc e {2, . . . , d}, denote p\. (Yi, . . . ,Yi-i) = 
P(fi(Y 1 ,...,Yi- 1 ,9 i ) <t\Y 1 ,...,Y i - 1 ). Then, Vz 6 M, let A Yi ^ Y ._ x {z) denote the event 
{p g . (Yi, . . . , Vi-i) < z}. By definition, 

fi..,wW y i yw) = M{tGM\p(A^ !Yii (z)) = l], 

Assuming (ii), p e . (Yi, . . . , Yi-!) is a globally decreasing function of Yi, . . . , Yi_i. Thus, given 
t, the occurence of event A Y Y-xiv) similarly decreases. Necessarily t increases, hence the 
minimum value of all t € M such that V{A Yl y-S z )) = 1 increases. Hence i _ 1 is a 

globally increasing function of Y x , . . . , Yi_i, Vi G {2, . . . , d}. Since Yi = F _1 (Xi) is naturally 
an increasing function of X\, a simple recursive reasoning shows that . ^ is an increasing 

function of Xi, . . . , The statement of the lemma follows. □ 
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